R Beginners Course

Spatial and temporal statistics

Objective and contents

Objective and contents

Objective: to perform basic spatio-temporal statistics using gridded and vector data

  • Raster statistics
    • Minimum, maximum, mean, standard deviation, sum
    • Frequency of cells in a raster
    • Writing derived values as a time series
  • Example: Raster statistics over an area
    • Calculating mean P and Ea over an area
    • Analysing P minus Ea patterns

Raster statistics

Raster Statistics

First, we will start with some basic raster statistics. For this, we need to load the terra package in R:

library(terra)

Please load the following raster of the global monthly average temperatures for the month of February.

temp_feb_path <- "C:/Users/Temperature_Data/wc2.1_2.5m_tavg_02.tif"
temp_feb      <- rast(temp_feb_path)

Note that the data are provided in degrees Celsius.

Raster Statistics

Let’s start by looking at the data:

plot(temp_feb)

Raster Statistics

hist(temp_feb)
## Warning: [hist] a sample of3% of the cells was used (of which 66% was NA)

Minimum and Maximum Values

We can extract the minimum and maximum values of a raster:

minmax(temp_feb)
##      wc2.1_2.5m_tavg_02
## [1,]             -44.80
## [2,]              32.96

Minimum and Maximum Values

If we just want to check the minimum and maximum values (i.e., not save the number or use it in a calculation), these values come up when you type the name of the raster and hit enter.

temp_feb
## class       : SpatRaster 
## dimensions  : 4320, 8640, 1  (nrow, ncol, nlyr)
## resolution  : 0.04166667, 0.04166667  (x, y)
## extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source      : wc2.1_2.5m_tavg_02.tif 
## name        : wc2.1_2.5m_tavg_02 
## min value   :              -44.8 
## max value   :              32.96

Grid cells with Specific Attributes

To find out the number of cells with a certain value, we can use the freq function.

freq(x, digits = 0, value = NULL, ...)

x: raster layer

digits: used to round the grid-cell values

value: optional single value to only count the number of cells with that value

Grid cells with Specific Attributes

If we don’t specify a value or a rounding condition:

freq(temp_feb)
##       layer value  count
##  [1,]     1   -45  17478
##  [2,]     1   -44  83878
##  [3,]     1   -43 117982
##  [4,]     1   -42 102845
##  [5,]     1   -41 106076
##  [6,]     1   -40 152928
##  [7,]     1   -39 154018
##  [8,]     1   -38 151494
##  [9,]     1   -37 226850
## [10,]     1   -36 213282
## [11,]     1   -35 232837
## [12,]     1   -34 293376
## [13,]     1   -33 273896
## [14,]     1   -32 275022
## [15,]     1   -31 241752
## [16,]     1   -30 213349
## [17,]     1   -29 250686
## [18,]     1   -28 211607
## [19,]     1   -27 203796
## [20,]     1   -26 198331
## [21,]     1   -25 219682
## [22,]     1   -24 205186
## [23,]     1   -23 215089
## [24,]     1   -22 186505
## [25,]     1   -21 182648
## [26,]     1   -20 197982
## [27,]     1   -19 194459
## [28,]     1   -18 210066
## [29,]     1   -17 241606
## [30,]     1   -16 259705
## [31,]     1   -15 223245
## [32,]     1   -14 209189
## [33,]     1   -13 211503
## [34,]     1   -12 191229
## [35,]     1   -11 157554
## [36,]     1   -10 149361
## [37,]     1    -9 136770
## [38,]     1    -8 130283
## [39,]     1    -7 125210
## [40,]     1    -6 121743
## [41,]     1    -5 110963
## [42,]     1    -4 109459
## [43,]     1    -3 116599
## [44,]     1    -2 111883
## [45,]     1    -1  94808
## [46,]     1     0 117162
## [47,]     1     1  94409
## [48,]     1     2  84865
## [49,]     1     3  77204
## [50,]     1     4  84120
## [51,]     1     5  83697
## [52,]     1     6  80351
## [53,]     1     7  74984
## [54,]     1     8  72369
## [55,]     1     9  73109
## [56,]     1    10  73799
## [57,]     1    11  67978
## [58,]     1    12  84051
## [59,]     1    13 101461
## [60,]     1    14 119884
## [61,]     1    15 110814
## [62,]     1    16  88892
## [63,]     1    17  85445
## [64,]     1    18 104649
## [65,]     1    19 121411
## [66,]     1    20 140826
## [67,]     1    21 172560
## [68,]     1    22 214183
## [69,]     1    23 237690
## [70,]     1    24 287524
## [71,]     1    25 399249
## [72,]     1    26 463549
## [73,]     1    27 335760
## [74,]     1    28 192903
## [75,]     1    29 104746
## [76,]     1    30  82156
## [77,]     1    31  50880
## [78,]     1    32  15303
## [79,]     1    33    419

Grid cells with Specific Attributes

Now, let’s define only looking for NA cells:

freq(temp_feb, value = NA)
##      layer value    count
## [1,]     1    NA 24792188

And let’s look for 20 degrees, but within rounding margin of 0.1 (note, that means only values from 19.95 to 20.05).

freq(temp_feb, digits = 1, value = 20)
##      layer value count
## [1,]     1    20 13606

Writing derived values as time series

Let’s use what we just learnt to write a time series for global monthly mean temperature over the 12 months for which we have data (over land areas, because oceans are stored as NA).

We will start by loading the rasters and creating a raster stack with all 12 rasters:

t_path <- "C:/Users/Temperature"
t_files <- list.files(t_path, full.names = TRUE)
t_stack <- rast(t_files)

Writing derived values as time series

Now we want to calculate the mean value in each layer:

mean_values <- global(t_stack, stat = mean, na.rm = TRUE)
print(mean_values)
##                          mean
## wc2.1_2.5m_tavg_01 -6.9901310
## wc2.1_2.5m_tavg_02 -8.2640013
## wc2.1_2.5m_tavg_03 -8.4594775
## wc2.1_2.5m_tavg_04 -6.3245406
## wc2.1_2.5m_tavg_05 -3.3703036
## wc2.1_2.5m_tavg_06 -0.6429807
## wc2.1_2.5m_tavg_07  0.1324651
## wc2.1_2.5m_tavg_08 -0.6111403
## wc2.1_2.5m_tavg_09 -2.2624299
## wc2.1_2.5m_tavg_10 -4.0777166
## wc2.1_2.5m_tavg_11 -5.3751544
## wc2.1_2.5m_tavg_12 -5.8202356

Writing derived values as time series

Question: Because of the projection system for the data in this example (WGS84), will the results be representative of the global mean temperature averages over land or will they be biased? If they are biased, in which way (i.e., are the results of our analysis higher or lower that the actual average temperature over land)?

Writing derived values as time series

Let’s plot the outputs.

plot(1:12, mean_values$mean, type = "l", xaxt = "n", xlab = "Month", ylab = "Temp [deg. C]")
axis(1, 1:12, month.abb)

xaxt = “n” is to not include labels on the x-axis. This is because we want to plot the month abbreviations (see next line) instead of a default 1 to 12.

month.abb is a predefined object in R that has the abbreviations for each month.

Writing derived values as time series

Writing derived values as time series

Finally, we want to export these values as a time series.

write.table(mean_values, "C:/User/Exported_TS.csv", row.names = FALSE, sep = ",")

Summary of what the process

We have just shown you how to calculate the mean value (or another variable) for each time step. Then we have generated a time series for the 12 months with one value for each month-

Next step

Now, we want to calculate the mean annual temperature at each grid-cell. We will calculate the mean of the 12 values that correspond to the 12 months for every grid-cell.

Calculating mean values at each grid-cell

We will use the same temperature data that we used in the previous example. This time, instead of global, we can use the mean function.

mean_raster <- mean(t_stack)
plot(mean_raster)

Visualising NA Values in a Raster File

Sometimes it is useful to visualise the location of NA values, for this purpose we can use the colNA parameter from the plot function.

plot(mean_raster, colNA = "red")

Example: Raster Statistics over an Area (P minus Ea)

Raster Statistics over an Area

For this exercise, we want to look at precipitation and evapotranspiration patterns over Uganda. We have the following data:

  • The shapefile for the Nile Basin countries
  • Global monthly P from CHIRPS for 2018
  • Monthly Ea over Africa from SSEBop

We want to generate a line chart for mean monthly P and mean Ea over Uganda

Raster Statistics over an Area

Step 1: Load the required packages into R.

library(sf)
library(terra)

Step 2: Set the paths for the shapefile and the folders with the CHIRPS and SSEBop data.

dir <- "C:/Users/L5_Spatial_and_Temporal_Statistics_Data"
shape_path  <- file.path(dir, "Shapefile/Nile_Basin_Countries_GAUL2014_2.shp")
chirps_path <- file.path(dir, "CHIRPS")
ssebop_path <- file.path(dir, "SSEBop")

Raster Statistics over an Area

Step 3: Read the shapefile and extract the polygon corresponding to Uganda.

countries <- read_sf(shape_path)
uganda    <- countries[5,]
uganda <- vect(uganda)

Raster Statistics over an Area

plot(uganda)

Raster Statistics over an Area

Step 4: Create a stack of the monthly CHIRPS files.

chirps_files <- list.files(chirps_path, full.names = TRUE)
chirps       <- rast(chirps_files)
plot(chirps)

Raster Statistics over an Area

We want to calculate the mean monthly precipitation over Uganda. That means that we only want to calculate raster statistics within this shapefile. What is one way to do this?

  • Mask the raster to the shapefile (not clip!)
  • Calculate statistics using this masked raster

But, if we look for other options, we can see that there is a function that already performs this process for us: the extract function (in the terra package).

Raster Statistics over an Area

Step 5: Extract the monthly mean precipitation over Uganda.

ug_pmean <- terra::extract(chirps, uganda, fun = mean, na.rm = TRUE)

If we look at our output, we can see that it is stored as a matrix. The first value should be removed.

print(ug_pmean)
##   ID chirps_monthly2018.01 chirps_monthly2018.02 chirps_monthly2018.03
## 1  1              21.60957              46.26966               171.186
##   chirps_monthly2018.04 chirps_monthly2018.05 chirps_monthly2018.06
## 1              227.9121              187.1596              125.8768
##   chirps_monthly2018.07 chirps_monthly2018.08 chirps_monthly2018.09
## 1              62.84645              134.0909              103.6331
##   chirps_monthly2018.10 chirps_monthly2018.11 chirps_monthly2018.12
## 1              126.2652              86.03881              99.42722

Let’s convert the output to a numeric vector so the values are easier to handle.

ug_pmean <- as.numeric(ug_pmean)[-1]
print(ug_pmean)
##  [1]  21.60957  46.26966 171.18602 227.91208 187.15955 125.87678  62.84645
##  [8] 134.09085 103.63311 126.26518  86.03881  99.42722

Raster Statistics over an Area

Step 6: Repeat the same process for the Ea data: first create a raster stack and then extract the monthly mean Ea.

ssebop_files <- list.files(ssebop_path, full.names = TRUE)
ssebop       <- rast(ssebop_files)
plot(ssebop)

Raster Statistics over an Area

ug_eamean <- terra::extract(ssebop, uganda, fun = mean, na.rm = TRUE)
ug_eamean <- as.numeric(ug_eamean)[-1]
print(ug_eamean)
##  [1]  71.16544  50.92773  93.04707  95.88938  98.27621  93.29266  85.61035
##  [8]  87.66707  96.88887 100.74566  94.00632  91.06700

Raster Statistics over an Area

Step 7: Create a line chart for mean monthly P and mean Ea over Uganda.

First, we start by just plotting the mean monthly P.

plot(1:12, ug_pmean, type = "l", main = "Monthly Mean P and Ea for Uganda - 2018",
col = "blue", lwd = 2, xlab = "Month", ylab = "P and Ea [mm]", xaxt = "n")

Next, we can add the Ea using the lines command.

lines(1:12, ug_eamean, col = "darkred", lwd = 2)

Raster Statistics over an Area

Finally, we can include the month abbreviations on the x-axis, gridlines and a legend.

axis(1, 1:12, month.abb)
grid()
legend("topright", c("P", "Ea"), lty = 1, col = c("blue", "darkred"))

Raster Statistics over an Area

Raster Statistics over an Area

For the exercise shown in the lecture slides, we will generate three more outputs:

  1. Boxplot for monthly Ea over Uganda

  2. Raster spatial distribution of annual accumulated P minus Ea over Uganda

  3. Histogram of annual accumulated P minus Ea values over Uganda

Raster Statistics over an Area

To be able to complete our three extra objectives, we want to create masked monthly P and Ea rasters over Uganda.

Use the mask function to generate the monthly P and Ea rasters.

uganda_prec <- crop(chirps, uganda, snap = "OUT")
uganda_prec <- mask(uganda_prec, uganda)
uganda_ea <- crop(ssebop, uganda, snap = "OUT")
uganda_ea <- mask(uganda_ea, uganda)

Raster Statistics over an Area

plot(uganda_prec)

Raster Statistics over an Area

plot(uganda_ea)

Raster Statistics over an Area - a)

Generate a boxplot to show the dispersion of the Ea values for every month.

boxplot(uganda_ea, main = "Monthly Ea for Uganda - 2018", col = "red", xlab = "Month", ylab = "Ea [mm]", xaxt = "n")
## Warning: [boxplot] taking a sample of 1e+05 cells
axis(1, 1:12, month.abb)
grid()

Raster Statistics over an Area - b)

To calculate P minus Ea for 2018, we first must compute the annual accumulated P and Ea for Uganda.

Generate rasters for the annual accumulated P and Ea

annual_prec <- sum(uganda_prec)
annual_ea   <- sum(uganda_ea)

Raster Statistics over an Area - b)

plot(annual_prec, main = "Annual P for 2018")

Raster Statistics over an Area - b)

plot(annual_ea, main = "Annual Ea for 2018")

Raster Statistics over an Area - b)

Resample the P raster to the same spatial resolution as the Ea raster.

annual_prec <- resample(annual_prec, annual_ea, method = "bilinear")

Calculate annual accumulated P minus Ea over Uganda.

p_minus_ea <- annual_prec - annual_ea

Raster Statistics over an Area - b)

plot(p_minus_ea, main = "P - Ea for 2018")

Raster Statistics over an Area - c)

Plot a histogram of the distribution of annual accumulated P minus Ea values.

hist(p_minus_ea)

Plotting Shape Files and Rasters

Remember, if you are more comfortable plotting shape files and rasters using another programme (eg. ArgGIS, QGIS, Google Earth Engine), this can still easily be done. Just remember that you need to write your results first (i.e., save the outputs).